The genome organization and expression strategy of the newly identified
severe acute respiratory syndrome coronavirus (SARS-CoV) were predicted
using recently published genome sequences. Fourteen putative
open reading frames were identified, 12 of which were predicted to be
expressed from a nested set of eight subgenomic mRNAs. The synthesis
of these mRNAs in SARS-CoV-infected cells was confirmed experimentally.
The 4382- and 7073-amino acid residue SARS-CoV replicase polyproteins
are predicted to be cleaved into 16 subunits by two viral
proteinases (bringing the total number of SARS-CoV proteins to 28). A
phylogenetic analysis of the replicase gene, using a distantly related
torovirus as an outgroup, demonstrated that, despite a number of unique
features, SARS-CoV is most closely related to group 2 coronaviruses.
Distant homologs of cellular RNA processing enzymes were identified in
group 2 coronaviruses, with four of them being conserved in SARS-CoV.
These newly recognized viral enzymes place the mechanism of coronavirus
RNA synthesis in a completely new perspective. Furthermore,
together with previously described viral enzymes, they will be important
targets for the design of antiviral strategies aimed at controlling the
further spread of SARS-CoV.
Introduction

Severe acute respiratory syndrome (SARS) is a
life-threatening form of atypical pneumonia that
recently emerged in Guangdong Province, China.
A previously unknown coronavirus was isolated
from SARS patients and is considered the cause
of this emerging respiratory disease. In an
extraordinary effort, the full-length genome sequence
of the SARS-coronavirus (SARS-CoV) was elucidated
within weeks after the identification of this
novel pathogen and published by the Michael
Smith Genome Sciences Center (Vancouver,
Canada; Entrez Genomes accession number
NC_004718 (AY274119)), the Centers for Disease
Control and Prevention (Atlanta, USA; GenBank
accession number AY278741), and others. The
SARS-CoV genome is ~29.7 kb long and contains
14 open reading frames (ORFs) flanked by 5'- and
3'-untranslated regions of 265 and 342 nucleotides,
respectively (Figure 3). Homologs of proteins
conserved in all coronaviruses are encoded by the
overlapping ORFs 1a and 1b, and by ORFs 2, 4, 5,
6 and 9a (Figure 1; Tables 1 and 2).

Coronaviruses are enveloped, positive-stranded
RNA (+RNA) viruses, with a single-stranded
genome of between 27 kb and 31.5 kb, the largest
among known RNA viruses. The genomes of
coronaviruses and related viruses in the order
Nidovirales are polycistronic and are expressed
through a sophisticated combination of poorly
understood regulatory mechanisms. Coronavirus
genome expression starts with the translation of
two large replicase ORFs (1a and 1b; Figure 1),
whose coding capacity is about twice that of the
average complete +RNA virus genome. Via a -1
ribosomal frameshift, the ORF1a polyprotein
(pp1a; >4000 amino acid residues) can be
extended with ORF1b-encoded sequences to yield
a >7000 amino acid residue pp1ab polyprotein.
Replicase polyprotein processing is carried out by
two or three ORF1a-encoded viral proteinases.
The processing products are a group of largely
uncharacterized (putative) replicative enzymes,
including an RNA-dependent RNA polymerase,
an RNA helicase that is fused to a complex
N-terminal Zn-finger, and a Zn-ribbon-containing
papain-like proteinase. The replicase subunits
are thought to assemble into a viral replication
complex that is targeted to cytoplasmic membranes
by various membrane-associated viral proteins.
In addition to genome replication, the coronavirus
replicase complex mediates the synthesis of an
extensive nested set of subgenomic (sg) mRNAs
(transcription) to express all ORFs downstream of
ORF1b, which encode a variety of structural and
accessory proteins. The number and composition
of these 3'-proximal ORFs vary greatly among
coronaviruses, but they always include genes for the
Figure 1. Overview of the SARS-CoV genome organization and expression. Comparison of the genome organizations
of SARS-CoV and bovine coronavirus (BCoV). The replicase genes are depicted, with ORF1a, ORF1b, and the
ribosomal frameshift site indicated. Arrows represent sites in the corresponding replicase polyproteins that are cleaved
by papain-like proteinases (orange) or the 3C-like cysteine proteinase (blue). Cleavage products are provisionally
numbered nsp1-nsp16 (see also Table 1). In the 3'-terminal part of the genomes, homologous structural protein genes
are indicated in matching colors. Close-ups of two regions with major differences are shown (and see the text). In the
N-terminal half of replicase ORF1a, SARS-CoV lacks one of the PLpro domains (indicated in orange/green in BCoV)
and contains a unique insertion (SUD). In the region with structural and accessory protein genes, the locations of the
body TRSs involved in subgenomic RNA synthesis are indicated with red boxes (see Figure 3 and Hofmann et al.).
The bottom part of the Figure illustrates which parts of the genome are conserved in the genus Coronavirus and in
the order Nidovirales (the ORF1a sequence of toroviruses, which largely remains to be sequenced, could not be
included). Furthermore, it is indicated for which domains homologs have been identified in other RNA viruses and
the cellular world. Enzymes for which structural data are available are shown in blue. SUD, SARS-CoV unique
domain; PLpro, papain-like cysteine proteinase; 3CLpro, 3C-like cysteine proteinase; TM, transmembrane domain;
ADRP, adenosine diphosphate-ribose 1''-phosphatase; ExoN, 3'-to-5' exonuclease; CLpro, chymotrypsin-like proteinase;
RdRp, RNA-dependent RNA polymerase; HEL1, superfamily 1 helicase; XendoU, (homolog of) poly(U)-specific
endoribonuclease; 2'-O-MT, S-adenosylmethionine-dependent ribose 2'-O-methyltransferase; CPD, cyclic
phosphodiesterase. Domains Ac, X, and Y are described by Ziebuhr et al. and Gorbalenya et al.
Table 1. Predicted SARS-CoV replicase cleavage products and their mode of expression

Protein order^a in        Position in polyproteins   Protein size   Associated putative       Predicted mode of expression
polyproteins pp1a/pp1ab   pp1a/pp1ab (amino acid     (amino acid    functional domain(s)^c    and release from
                          residues)^b                residues)                                polyproteins^d

nsp1-pp1a/pp1ab           1Met-Gly180
nsp2-pp1a/pp1ab           181Ala-Gly818
nsp3^e-pp1a/pp1ab         819Ala-Gly2740
nsp4-pp1a/pp1ab           2741Lys-Gln3240
nsp5-pp1a/pp1ab           3241Ser-Gln3546
nsp6-pp1a/pp1ab           3547Gly-Gln3836
nsp7-pp1a/pp1ab           3837Ser-Gln3919
nsp8-pp1a/pp1ab           3920Ala-Gln4117
nsp9-pp1a/pp1ab           4118Asn-Gln4230
nsp10-pp1a/pp1ab          4231Ala-Gln4369
nsp11-pp1a                4370Ser-Val4382
nsp12-pp1ab               4370Ser-Gln5301
nsp13-pp1ab               5302Ala-Gln5902
nsp14-pp1ab               5903Ala-Gln6429
nsp15-pp1ab               6430Ser-Gln6775
nsp16-pp1ab               6776Ala-Asn7073

Predictions are based on the SARS-CoV sequences published by Michael Smith Genome Sciences Centre (Vancouver, Canada;
Entrez Genomes accession number NC_004718 (AY274119)) and the Centers for Disease Control and Prevention (Atlanta, USA;
GenBank accession number AY278741) and an alignment of SARS-CoV with previously characterized coronavirus sequences as
summarized in Refs. 11,1832.

^a For convenience, replicase cleavage products were provisionally numbered non-structural protein (nsp) 1-16 according to their
position in the polyproteins.

^b Amino acids of replicase proteins pp1a and pp1ab were numbered assuming that, as in other coronaviruses, a -1 ribosomal
frameshift occurs; use of the slippery sequence UUUAAAC is predicted to yield a peptide bond between Asn4378 and Arg4379 in
pp1ab.

^c Abbreviations: PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1''-phosphatase; TM, transmembrane
domain; 3CLpro, 3C-like cysteine proteinase; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZD, putative
zinc-binding domain; HEL1, superfamily 1 helicase; NTD, nidovirus conserved domain; ExoN, 3'-to-5' exonuclease; 2'-O-MT,
S-adenosylmethionine-dependent ribose 2'-O-methyltransferase. Domains Ac, X, and Y are described in Refs 32 and 47.

^d Indicated are the SARS-CoV proteinases predicted to be involved in cleavage of the N- and/or C-termini of the cleavage
products. TI, translation initiation; TT, translation termination; RFS, ORF1a/ORF1b ribosomal frameshift.

^e Compared to the corresponding cleavage product of BCoV (see Figure 1), nsp3 lacks PL1pro and contains a ~375 amino acid
insertion between the X and PL2pro domains which is unique for SARS-CoV (see also Figure 1).
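
The coordinates in Table 1 can be checked for internal consistency. The following is a minimal sketch, assuming only the start and end residues listed in the table; the names `NSP_COORDS`, `size` and `tiles` are illustrative and not part of the original analysis. It derives each product's length and verifies that the products tile pp1a (residues 1-4382) and pp1ab (residues 1-7073) without gaps or overlaps.

```python
# Start/end residues as listed in Table 1 (pp1a/pp1ab numbering).
NSP_COORDS = {
    "nsp1": (1, 180), "nsp2": (181, 818), "nsp3": (819, 2740),
    "nsp4": (2741, 3240), "nsp5": (3241, 3546), "nsp6": (3547, 3836),
    "nsp7": (3837, 3919), "nsp8": (3920, 4117), "nsp9": (4118, 4230),
    "nsp10": (4231, 4369),
    "nsp11": (4370, 4382),   # pp1a only (no frameshift)
    "nsp12": (4370, 5301),   # pp1ab only (after the -1 frameshift)
    "nsp13": (5302, 5902), "nsp14": (5903, 6429),
    "nsp15": (6430, 6775), "nsp16": (6776, 7073),
}

def size(start, end):
    """Number of residues in an inclusive start..end range."""
    return end - start + 1

def tiles(names, first, last):
    """True if the named products cover first..last with no gaps or overlaps."""
    expected_start = first
    for name in names:
        start, end = NSP_COORDS[name]
        if start != expected_start:
            return False
        expected_start = end + 1
    return expected_start == last + 1

# Products released from pp1a and from pp1ab, in N- to C-terminal order.
PP1A = ["nsp%d" % i for i in range(1, 12)]
PP1AB = ["nsp%d" % i for i in range(1, 11)] + ["nsp%d" % i for i in range(12, 17)]

if __name__ == "__main__":
    for name in PP1AB:
        print(name, size(*NSP_COORDS[name]))
    print("pp1a tiled:", tiles(PP1A, 1, 4382))
    print("pp1ab tiled:", tiles(PP1AB, 1, 7073))
```

Because nsp11 and nsp12 share their first residue (Ser4370), the two lists differ only in which of the two products follows nsp10.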



structural proteins S, M, E and N, which drive
cytoplasmic virus assembly. The mechanisms underlying
the synthesis of genomic and subgenomic RNAs are
poorly understood. To explain the composite
structure of the sg mRNAs, which are both 5'- and
3'-coterminal with the viral genome, several models
have been put forward, of which the one postulating
the discontinuous synthesis of negative-stranded
sg templates for sg mRNA synthesis has received
wide support recently.
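
The -1 ribosomal frameshift that extends pp1a into pp1ab (see also footnote b of Table 1) can be illustrated with a toy example. The sketch below assumes the simplest textbook picture: the ribosome decodes the slippery site UUUAAAC in the 0 frame as U UUA AAC, slips back by one nucleotide, and resumes with a codon that starts at the final C of the site, so that this nucleotide is read twice. The sequence `TOY_MRNA` and all function names are invented for illustration; this is not the SARS-CoV sequence.

```python
# Standard genetic code, built compactly from the usual UCAG ordering.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(rna, start=0):
    """Translate from `start` until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def translate_with_minus1_shift(rna, slippery="UUUAAAC"):
    """Translate in frame 0 up to the slippery site, then continue in frame -1."""
    site = rna.find(slippery)
    if site < 0:
        raise ValueError("no slippery sequence found")
    # In the 0 frame the site is read as U_UUA_AAC, so AAC starts at site + 4.
    if (site + 4) % 3 != 0:
        raise ValueError("slippery site is not in the expected frame")
    upstream = translate(rna[:site + 7])
    if 3 * len(upstream) != site + 7:
        raise ValueError("stop codon upstream of the slippery site")
    # After the slip, the last nucleotide of the site is re-read as the
    # first nucleotide of the next codon.
    downstream = translate(rna, start=site + 6)
    return upstream + downstream

# Hypothetical toy mRNA: AUG GCU UUA AAC GGG UAA ... in the 0 frame.
TOY_MRNA = "AUGGCUUUAAACGGGUAACUUGA"

if __name__ == "__main__":
    print(translate(TOY_MRNA))                    # frame 0 only
    print(translate_with_minus1_shift(TOY_MRNA))  # with the -1 frameshift
```

In the toy sequence, the Asn codon (AAC) is followed after the slip by an Arg codon (CGG), mirroring the Asn-Arg junction predicted for pp1ab.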

On the basis of antigenic cross-reactivity,
coronaviruses were originally classified into three groups
(termed groups 1, 2, and 3). Subsequently, the
phylogeny-based clustering of coronaviruses
proved at first (almost) identical with that based
on antigenic cross-reactivity. The same three
clusters were evident upon analysis of the replicase
region, which does not contribute to virion
antigenicity. This indicated that different regions of the
coronavirus genome have indeed co-evolved and
that intergroup recombination has not played a
prominent role in coronavirus evolution. However,
the agreement between the two classifications
is not perfect, as some coronaviruses are
sufficiently different to not have antigenic
cross-reactivity with the established groups but close
enough to cluster with one of them (group 1) on
the basis of sequence comparisons. Consequently,
these viruses were placed into (the expanded)
group 1. Here, we refer to coronavirus groups as
evolutionary clusters that unite viruses not
necessarily having antigenic cross-reactivity.

Using the recently published SARS-CoV genome 
sequences, we provide insight into the evolution,
organization and expression of SARS-CoV. The 
SARS-CoV genome and proteome were compared 
with those of other coronaviruses, distantly related 
nidoviruses, and databases, and several of our 
predictions were verified experimentally. 



Results and Discussion 

SARS-CoV represents a lineage that has split 
off from the group 2 branch relatively late in 
coronavirus evolution 

To optimize our understanding of the SARS-CoV 
genome, we sought to infer the phylogenetic 
position of the novel agent relative to known 






Table 2. Predicted SARS-CoV proteins expressed from
subgenomic mRNAs 2 to 9

ORF        Protein size   Subgenomic mRNA        Protein name/
number^a   (amino acid    predicted to be used   function
           residues)      for expression

2                         2                      Spike (S) protein
3a                        3
3b^b                      3
4          76             4                      Envelope (E) protein
5          221            5                      Membrane (M) protein
6          63             6
7a         122            7
7b^c       44             7
8a^d       39             8
8b         84             8
9a         422            9                      Nucleocapsid (N) protein
9b^e       98             9

Predictions are based on the SARS-CoV sequences published
by Michael Smith Genome Sciences Centre (Vancouver, Canada;
Entrez Genomes accession number NC_004718 (AY274119)) and
the Centers for Disease Control and Prevention (Atlanta, USA;
GenBank accession number AY278741).

^a See also Figures 1 and 3.

^b ORF3b (462 nucleotides) overlaps with the 3' half of ORF3a,
the RNA4 body TRS and the 5' end of ORF4. It is the fifth largest
reading frame downstream of ORF1b (after ORFs 2, 3a, 5 and 9a),
making it a likely candidate to be expressed. Since its translation
initiation codon is the 13th AUG codon in mRNA3, ORF3b
expression should involve a mechanism like internal ribosomal
entry (as previously suggested for some other coronavirus
ORFs; Ref. 78) or the synthesis of an as yet undetected
additional subgenomic mRNA.

^c The translation termination codon of ORF7a and translation
initiation codon of ORF7b overlap. The absence of any other
upstream AUG codons (with the exception of that of ORF7a)
and good context for translation initiation of the ORF7b AUG
codon suggest that ORF7b may be expressed from subgenomic
RNA7 by "leaky scanning" of ribosomes.

^d The putative ORF8a start codon is in a good context for
translation initiation and immediately follows the body TRS
involved in mRNA8 transcription, making it likely that ORF8a
is expressed from mRNA8. The mechanism used to express the
larger downstream ORF8b is more puzzling, since its (putative)
translation initiation codon appears to have a poor context for
translation initiation and two additional AUG codons are present
in the region between the putative start codons of ORFs 8a and
8b. Recently, some SARS-CoV isolates from human and civet
cat origin (L.L.M.P. and Y.G., unpublished results) were
reported to contain a 29 nucleotide insertion that results in the
in-frame fusion of ORFs 8a and 8b. Consequently, ORF8b in the
Frankfurt-1 and HKU-39849 isolates used in this study may be
translationally silent.

^e A functional "internal" open reading frame, overlapping
with the N protein gene, has been described for other group 2
coronaviruses, e.g. BCoV; ORF9b appears to occupy a
corresponding position and may be expressed following "leaky
scanning" by ribosomes.



coronaviruses. Recent phylogenetic analyses of
different SARS-CoV proteins using unrooted trees
consistently showed that SARS-CoV does not
segregate into any of the three currently established
coronavirus groups. These results were
interpreted as support for the classification of
SARS-CoV as the prototype of a novel, fourth group of
coronaviruses. However, in our opinion, the
evidence leading to this conclusion was
inconclusive and alternative interpretations, with
SARS-CoV being an outlier in one of the established
groups, remained possible. This uncertainty can
be resolved only through the reconstruction of
coronavirus evolution from its origin using a
rooted phylogenetic tree, which is most reliable
when an outgroup is included in the analysis. The
closest known outgroup for coronaviruses is
the toroviruses, which form a separate genus in
the same virus family. The ORF1b part of the
replicase and the two virion proteins S and M are
homologous in coronaviruses and toroviruses.
Unfortunately, however, the level of conservation
of the S and M protein genes is so low that we
consider only the phylogenetic analysis of replicase
ORF1b to be truly informative.

Consequently, to resolve the phylogenetic
position of SARS-CoV, the equine torovirus (EToV)
was included in our analysis, which was limited
to replicase ORF1b, the most conserved part of
the genome. It should be noted, however, that the
size of this genome segment (~5500 nucleotides)
approximates the combined size of the genes
encoding the four virion-associated proteins S, M,
E, and N. A fully resolved tree was obtained, with
all branches supported in more than 960 out of
1000 bootstrap trials (Figure 2). The topology of
this tree suggests strongly that the SARS-CoV
lineage was an early split-off from the group 2 branch,
which occurred after the two bifurcations that gave
rise to the three major coronavirus groups
(Figure 2). Accordingly, in two regions of the
replicase ORF1a polyprotein, nsp1 and one of the nsp3
domains, which differentiate the three coronavirus
groups, SARS-CoV contains orthologs of domains
that are unique for group 2 coronaviruses (see
Figure S1 of the Supplementary Material). The
published unrooted trees for the virion proteins
and 3CLpro are also compatible with this
phylogeny, although formally we cannot exclude
the occurrence of recombination with other
coronaviruses in very limited regions. In this respect, we
would like to stress that the differences in the
composition and arrangement of ORFs in the
3'-proximal region of the genome (downstream of ORF1b;
see Figure 1) between SARS-CoV and established
group 2 coronaviruses do not contradict the
above results. Group 1 coronaviruses also differ in
this region through the presence of unique
so-called "accessory" non-structural protein genes.
Some of these genes have been found to be
dispensable for virus reproduction in tissue culture
and/or animals. The fact that, apparently, they
can be acquired or lost easily in the course of
evolution indicates that these genes cannot be
considered reliable group markers.

In conclusion, SARS-CoV is distantly related to 
established group 2 coronaviruses, a relationship 
comparable to that observed in group 1 between 
porcine epidemic diarrhoea coronavirus (PEDV) 
and human coronavirus 229E (HCoV-229E) on 
the one hand, and transmissible gastroenteritis 
Figure 2. Phylogenetic analysis of coronavirus
replicase genes. SARS-CoV replicase ORF1b amino acid
sequences (Entrez Genomes accession number
NC_004718 (AY274119)) were compared with those from
viruses representing the three coronavirus subgroups
and the genus Torovirus. Group 1: transmissible
gastroenteritis virus (TGEV), NC_002306; human coronavirus
229E (HCoV-229E), NC_002645; porcine epidemic
diarrhea virus (PEDV), NC_003436. Group 2: mouse
hepatitis virus A59 (MHV-A59), NC_001846; bovine
coronavirus (BCoV-Lun), AF391542. Group 3: infectious
bronchitis virus (IBV), strains Beaudette (NC_001451)
and LX4 (AY223860). Torovirus: equine torovirus
(EToV), X52374. A multiple protein alignment of these
sequences was generated with the help of the
ClustalX 1.82 program and was adjusted manually. Two
regions of poor conservation were removed from the
alignment, which was converted subsequently into the
nucleotide form. All columns containing gaps were
removed. The resulting alignment contains the following
SARS-CoV sequences fused: 13,623-13^9, 14,310-
18,857 and 20,076-21,482. It included 5487 characters
with 3207 of them being parsimony-informative. Using
the PAUP program (version 4.0.0d55) and the parsimony
criterion, an exhaustive tree search of the 135,135 evaluated
trees identified the best tree having a score of 10,927 and
the second best tree having a score of 10,964; the worst
tree had a score of 13,611. A total of 1000 bootstrap trials
were conducted using the parsimony criterion and a
branch-and-bound search to generate a bootstrap 50%
majority-rule consensus tree. The frequency of
occurrence of particular bifurcations in bootstraps is indicated
at the nodes. Similar trees with similarly high bootstrap
support above 960 were obtained using the NJ method
that was applied to distance matrices obtained for either
nucleotide or amino acid alignments (not shown).
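
The figure of 135,135 trees in the exhaustive search matches the number of distinct unrooted bifurcating topologies for the nine sequences listed in the legend (SARS-CoV, three group 1 viruses, two group 2 viruses, two IBV strains and EToV). As a brief sketch using the standard combinatorial formula (2n-5)!! for n taxa (the function name `num_unrooted_trees` is illustrative):

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating tree topologies for n labelled taxa.

    Uses the standard result (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count

if __name__ == "__main__":
    # Nine sequences are listed in the legend of Figure 2.
    print(num_unrooted_trees(9))
```

For nine taxa the product is 3 x 5 x 7 x 9 x 11 x 13 = 135,135, so an exhaustive search of all topologies is still feasible at this size.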



coronavirus (TGEV) and related viruses on the
other hand (Figure 2). Accordingly, the lack of
antigenic cross-reactivity observed between distant
group-mates in group 1 may be observed
between SARS-CoV and the established group 2
viruses. Thus, SARS-CoV may be the first
identified representative of a larger cluster that could be
called subgroup 2b, if the established group 2
coronaviruses were referred to as subgroup 2a. The
2b cluster should include the immediate ancestor
of SARS-CoV, which may circulate in the field. If
close relatives of SARS-CoV were to be identified
in animal hosts, the virus would represent the
second example of a group 2 coronavirus that
may have crossed the animal-human barrier. The
first putative case is that of the bovine coronavirus
(BCoV) and human coronavirus OC43
(HCoV-OC43), two viruses that are so closely related at
the genetic level that they can be considered to
be the same virus species.

Two proteinases are predicted to cleave the 
SARS-CoV replicase polyproteins into 16
subunits, the largest of these having a unique 
domain organization 

A detailed comparison of the SARS-CoV
replicase with that of its closest known relatives in
group 2, mouse hepatitis coronavirus (MHV) and
BCoV (Figure 1), revealed a replicase proteolytic
processing scheme and domain organization that,
with some notable exceptions (see below), proved
to be typical for group 2 viruses. Using the
conserved signatures of the cleavage sites recognized
by coronavirus proteinases and their flanking
sequences, we predict the generation of 16
replicase subunits through proteolysis mediated
by 3CLpro (11 cleavages) and PL2pro (three
cleavages) (Figure 1 and Table 1).

The most conspicuous differences between
known group 2 coronaviruses and SARS-CoV
were identified in nsp3, the largest replicase
subunit that is encoded by ORF1a (Table 1). Unlike all
other coronaviruses, SARS-CoV does not have an
ortholog of papain-like proteinase 1 (PL1pro; see
close-up in Figure 1), which was probably lost
during evolution of this lineage. This observation
implies that the three cleavages in the N-terminal
half of pp1a must all be performed by the
conserved PL2pro, a downstream-located paralog of
PL1pro. The ortholog of this proteinase appears to
dominate over PL1pro in HCoV-229E, and is the
only active PLpro in avian infectious bronchitis
coronavirus (IBV). Immediately upstream of
PL2pro, we identified a 375 amino acid residue
"orphan domain" in SARS-CoV (called SUD for
SARS-CoV unique domain; Figure 1), which is not
present in other coronaviruses. The corresponding
ORF1a region differs profoundly among group 1
coronaviruses. In one of these viruses (TGEV), and
in the group 3 IBV, this region contains just a few
amino acid residues, essentially fusing PL2pro to
the upstream X domain. In contrast, HCoV-229E
and PEDV share a conserved domain in this
position. Interestingly, nsp3 also was the main site
of replicase differences between BCoV variants
isolated from respiratory and intestinal samples from
an animal that had died during an outbreak of
fatal shipping pneumonia. Due to the plausible
Figure 3. SARS-CoV subgenomic mRNA synthesis. (A) Organization of ORFs in the 3' end of the SARS-CoV genome
with predicted leader and body TRSs indicated by small boxes. The subgenomic mRNAs resulting from the use of
these TRSs for leader-to-body fusion are depicted below, with mRNAs predicted to be functionally bicistronic
indicated with an asterisk (*). (B) Hybridization analysis of intracellular viral RNA from Vero cells infected with
SARS-CoV, Frankfurt-1 (Fr) and HKU-39849 (HK) isolates. See Materials and Methods for technical details.
Oligonucleotides complementary to sequences from the SARS-CoV leader sequence and to a region in the genomic 3' end
both recognized a set of nine RNA species (the genome (RNA1) and eight subgenomic RNAs), confirming the presence
of common 5' and 3' sequences. RNA from Vero cells infected with avian infectious bronchitis virus (IBV), which
produces only five subgenomic mRNAs of known sizes, was run in the same gel and used as a size marker.
(C) Model for nidovirus subgenomic RNA synthesis by discontinuous extension of minus strands. Whereas
genome replication relies on continuous minus strand synthesis (antigenome), subgenomic minus strands would be
produced by attenuation of nascent strand synthesis at a body TRS (red bar), followed by translocation of the nascent
strand to the leader TRS in the genomic template. Following base-pairing between the body TRS complement at the
3' end of the minus strand and the leader TRS, RNA synthesis would resume to complete the subgenomic minus
strand that would then serve as template for the transcription of subgenomic mRNAs.



multifunctionality of nsp3, which may be involved
in the control of subgenomic mRNA synthesis,
the gross internal rearrangements and point
mutations in this protein may have pleiotropic
effect(s) on SARS-CoV properties, including its
pathogenic potential.

SARS-CoV produces eight subgenomic 
mRNAs to express the ORFs located in the 
3'-proximal part of the genome 

In a striking parallel with the unique features of
nsp3, the 3'-proximal part of the SARS-CoV
genome contains five ORFs (6, 7a, 7b, 8a and 8b)
that are not present in established group 2
coronaviruses and for which no obvious homologs could
be identified upon sequence comparison.
Furthermore, SARS-CoV lacks counterparts for two genes
inserted between replicase ORF1b and the S gene
in subgroup 2a viruses (see the close-up in
Figure 1). All these ORFs (from 2 to 9b) are
predicted to be expressed from sg mRNAs in
SARS-CoV. In members of the genus Coronavirus and the
related family Arteriviridae, all sg mRNAs are
3'-coterminal with the viral genome, and contain a
common 5' leader sequence that is identical with
that of the genome. The fusion of the leader
to the coding part (or "body") of each of the sg
RNAs involves a discontinuous step in RNA
synthesis, which is currently believed to occur during
minus strand synthesis, thus producing composite
subgenomic negative-stranded templates for sg
mRNA synthesis (Figure 3(C)). Leader-to-body
joining is guided by a base-pairing
interaction involving conserved
transcription-regulating sequences (TRSs; also previously
termed "intergenic sequences (IGSs)" in
coronaviruses), which are found at the 3' end of the
genomic leader (leader TRS) and at the 5' end of
each of the sg RNA bodies (body TRSs), often
located exactly between two genes, but sometimes
located within the coding sequence of an upstream
gene (Figures 1 and 3(A)).

In the SARS-CoV genome we readily identified a
potential leader TRS (5'-CUAAACGAACUUU-3')
that has a 6-11 nucleotide match with a number
of sequences in the 3' end of the genome, many of
which are positioned immediately upstream of
viral genes (Figure 3(A)). As recognized also by
others, the sequence 5'-ACGAAC-3' is
absolutely conserved and can be considered the
core of the SARS-CoV TRS. Based on the
SARS-CoV sequence with the largest 5'-terminal segment
(accession number AY278741), the SARS-CoV
leader sequence is (at least) 72 nucleotides long,
similar to e.g. that of BCoV, with which it has a
striking 20 out of 21 nucleotides match
immediately upstream of the leader TRS
(5'-GAUCUCUUGUAGAUCUGUUC-3'). On the basis of the
location of putative body TRSs, the synthesis of
nine mRNAs by SARS-CoV was expected: the
genomic mRNA (RNA1) and eight subgenomic
mRNAs with sizes of approximately 8.4, 4.6, 3.8,
3.5, 3.0, 2.6, 2.1 and 1.8 kb (including 5' leader and
3' poly(A)-tail). However, in the first published
experimental analysis of the SARS-CoV-specific
mRNAs generated in infected Vero cells, the
synthesis of only five viral mRNAs could be
confirmed.

To investigate SARS-CoV RNA synthesis in more
detail, Vero cells were infected with SARS-CoV
isolates Frankfurt-1 and HKU-39849, and
intracellular RNA was analyzed by hybridization with
oligonucleotide probes complementary to a part of
the 5' leader sequence and a sequence just
upstream of the 3' poly(A) tail. The coronavirus
IBV, which also replicates in Vero cells, was used
as control and size marker. As illustrated in
Figure 3(B), the genomic RNA and all eight
predicted subgenomic transcripts were detected with
both SARS-CoV probes, confirming that
these RNAs contain both common 5'-terminal and
common 3'-terminal sequences. Remarkably, a
slight mobility shift was observed for RNAs 7 and
larger of the Frankfurt-1 isolate. The subsequent
sequence analysis of this virus revealed that this
was due to a 45 nt in-frame deletion in ORF7b,
probably the first documented example of
SARS-CoV genetic adaptation to cell culture conditions.
The confirmation of leader-body fusion sites of the
SARS-CoV subgenomic mRNAs will be published
elsewhere. Remarkably, up to four of the eight
SARS-CoV subgenomic mRNAs (3, 7, 8, and 9)
may be functionally bicistronic (Table 2), as
observed occasionally for other coronavirus
subgenomic mRNAs.

The replicase of coronaviruses includes a 
variety of putative RNA-processing enzymes

The production of a complex and diverse set of
RNA molecules by nidoviruses (including
SARS-CoV) is linked to an unparalleled complexity of
their giant replicase, which contains a variety of
(putative) enzymatic functions and a number of
completely uncharacterized domains (Figure 1).
We have initiated the characterization of
coronavirus replicase by comparative genomics, and
have regularly updated this analysis through
recent years. Our continuing analysis has now
identified distant coronavirus homologs of no fewer
than five cellular enzymes that are associated with
RNA processing (Figure 4): poly(U)-specific
endoribonuclease (XendoU), a 3'-to-5' exonuclease
(ExoN) that belongs to the DEDD superfamily,
S-adenosylmethionine-dependent ribose
2'-O-methyltransferase (2'-O-MT) of the RrmJ family,
adenosine diphosphate-ribose 1''-phosphatase
(ADRP), and cyclic phosphodiesterase (CPD).



In the SARS-CoV proteome, conserved domains
presumably associated with these activities were
mapped (from the N to C terminus) to the X
domain of nsp3 (ADRP), the N-terminal domain
of nsp14 (ExoN), a "nidovirus-specific" replicase
domain in the C-terminal part of nsp15
(XendoU), and nsp16 (2'-O-MT). The CPD-related
domain is not conserved in SARS-CoV, but was
identified in the product of ORF2 of established
group 2 coronaviruses, and in the very C-terminal
domain of the torovirus ORF1a polyprotein, as
well as in some double-stranded RNA rotaviruses.

The conservation in the ExoN, 2'-O-MT and
CPD-related domains of nidoviruses includes the
catalytic and other active-site residues identified
in the prototype cellular enzymes. Although the
active-site residues of the ADRP and XendoU
families are yet to be characterized, the most
conserved amino acids of these families are found in
their putative nidovirus homologs. Some of the
nidovirus domains may contain unique and
conserved additional domains. For instance, we noted
that the nidovirus ExoN homologs contain an
additional conserved domain resembling a
mononuclear Zn-finger (Figure 4(B)) between the
universally conserved blocks I and II, which include the
catalytic residues (two Asp and one Glu).
Another Zn-finger-like module has been inserted
between blocks II and III in the ExoN homolog of
roniviruses, a subset of nidoviruses (data not
shown). Our combined observations indicate that
the nidovirus homologs of these cellular RNA
processing enzymes must be enzymatically active,
although they may have evolved to act on specific
(and unique) substrates or have additional unique
components.

The newly predicted enzymes could be involved in the metabolism of
viral and/or cellular RNAs. For instance, the 2'-O-MT activity could be
used to produce the 5'-cap of viral mRNAs, as was demonstrated for a
homologous flavivirus enzyme. Based on a parallel with some cellular
DNA-processing homologs, like exonuclease I and the exonuclease domain
of DNA polymerases, it is tempting to speculate on a link between the
ExoN activity and RNA proofreading, repair, and/or recombination. The
first two activities are not known in RNA viruses, and recombination
commonly proceeds through the copy-choice mechanism with RdRp switching
templates to produce chimeric nascent chains. However, due
to the extreme size of their genomes, coronaviruses may differ from
other RNA viruses and share an unprecedented similarity with DNA-based
life-forms in the mechanisms of genome biosynthesis and maintenance. If
confirmed, these unusual properties would explain the preliminary
reports on the resistance of SARS-CoV to ribavirin, a drug that was
shown to force other RNA viruses into "error catastrophe". The
experimental verification of these predictions will be an important
step in increasing our understanding of the functional roles these
putative enzymes play in the replicative cycle of SARS-CoV and related
viruses. Extensive attempts to demonstrate the 2'-O-MT activity of
several coronaviruses (which was also recently predicted by others) in
a 5'-RNA-capping reaction have not produced conclusive evidence so far
(J.Z. and A.E.G., unpublished results). This development indicates
that, as before with other distant nidovirus homologs (e.g. the
helicase), the translation of bioinformatics predictions into a
functional description is likely to be a laborious and time-consuming
process, involving mainly the identification of virus-specific
substrates and proper assay conditions.

In this respect, we have made an observation that both provides
additional support for the provisional assignments made above and may
help in the experimental verification of the predicted activities. When
the five enzyme families listed above (Figure 4) were analyzed as a
single dataset, it became apparent that representatives of these
families cooperate in two nuclear intron RNA processing pathways. These
pathways are functionally antagonistic: intron excision aimed at the
synthesis of mature tRNA and the production of intron-encoded box C/D
small nucleolar RNA (snoRNA) from its host pre-mRNA (Figure 5(A)). In
the snoRNA pathway, XendoU initiates a cascade of poorly characterized
endo- and exonuclease reactions that may involve ExoN, a homolog of the
yeast Rrp6p exosome component, ultimately leading to the production of
mature U16 and U86 snoRNAs. Subsequently, these snoRNAs may be utilized
in diverse rRNA processing events involving nucleotide methylation by
fibrillarin, a 2'-O-MT, and assisted by helicase(s). Strikingly, the
homologs of three cellular enzymes from this pathway, encoded in the
replicases of all nidoviruses except for arteriviruses, are genetically
clustered in a single protein block (nsp14-nsp16) immediately
downstream of the RNA helicase (nsp13) (Figures 1 and 4). Because of
the proximity of these four domains to each other, their expression
must be tightly coordinated at the level of 3CLpro proteolysis and by
the upstream ORF1a/ORF1b ribosomal frameshift signal.

In the other pathway, which involves tRNA processing, the utilization
of a 2'-phosphate group of a splicing intermediate involves the
conversion of adenosine diphosphate ribose 1''-2'' cyclic phosphate
(Appr>p) by CPD into adenosine diphosphate ribose 1''-phosphate
(Appr-1''-p), of which the phosphate group may be further processed by
an ADRP. Both these activities may drive the production of mature tRNA.
Although the nidovirus homologs of CPD and ADRP remain to be
characterized, they are not under the control of the ORF1a/ORF1b
ribosomal frameshift signal (Figure 1) and may thus, unlike the
ORF1b-encoded enzymes, be produced in larger quantities.

Figure 4. Sequence alignments of protein families that include cellular
enzymes involved in RNA processing and their nidovirus homologs. Our
in-depth comparative sequence analysis (see Materials and Methods)
revealed a statistically significant relationship between functionally
uncharacterized proteins (domains) of nidoviruses, including SARS-CoV,
and five protein families that include enzymes involved in two nuclear
RNA processing pathways: intron excision to produce mature tRNA and the
production of intron-encoded box C/D small nucleolar RNA (snoRNA) from
its host pre-mRNA (Figure 5). Shown are alignments for key regions of a
few selected members of the following groups of enzymes: (A) XendoU
family; (B) ExoN family; (C) 2'-O-MT family; (D) CPD family; and
(E) ADRP family. These protein families may be known also under other
names. Cellular homologs, not necessarily including proteins involved
in the discussed RNA processing pathways, are listed in the top segment
of each alignment and nidovirus proteins in the bottom segment. In the
CPD family, along with group 2 coronavirus representatives, proteins of
two rotaviruses (double-stranded RNA viruses), which were identified in
this study, are listed. In both segments, residues are highlighted
independently: black for absolutely conserved residues and different
shades of grey to indicate different levels of conservation; amino acid
similarity groups used were: (i) D, E, N, Q; (ii) S, T; (iii) K, R;
(iv) F, W, Y; and (v) I, L, M, V. Positions occupied by identical or
similar residues in all proteins under comparison are indicated with an
asterisk (*) and colon (:), respectively, in the inter-segment row. For
the ExoN family, three motifs conserved in the DEDD superfamily and the
Zn-finger unique for the ExoN family are indicated. Database accession
numbers for nidovirus genome sequences: SARS-CoV, Entrez Genomes
accession number NC_004718 (AY274119); MHV-A59, NC_001846; BCoV-Lun,
AF391542; HCoV-229E, NC_002645; IBV-B, NC_001451; PEDV, NC_003436;
TGEV, NC_002306; equine torovirus (EToV), X52374; equine arteritis
virus (EAV), X53459; porcine reproductive and respiratory syndrome
virus (PRRSV), M96262; gill-associated virus (GAV), AF227196.
Abbreviations and NCBI protein database ID numbers or SwissProt names
of the remaining protein sequences are: (A) Npun 0562, hypothetical
protein of Nostoc punctiforme, ZP_00106190; Poliv smB, pancreatic
protein of Paralichthys olivaceus, BAA88246; Celeg Pp11, placental
protein 11-like precursor of Caenorhabditis elegans, NP_492590; Xlaev
endoU, endoU protein of Xenopus laevis, CAD45344; pp1b, ORF1b-encoded
part of nidovirus replicase polyprotein 1ab. (B) Yeast PAN2,
PAB-dependent poly(A)-specific ribonuclease subunit PAN2 of
Saccharomyces cerevisiae, P53010; Mycge DPO3, DNA polymerase III
polC-type, containing exonuclease domain, of Mycoplasma genitalium,
P47277; Bacsu DING, probable ATP-dependent helicase dinG homolog,
containing exonuclease domain, of Bacillus subtilis, P54394; Ecoli
DP3E, DNA polymerase III, epsilon chain, containing exonuclease domain,
of Escherichia coli, P03007 (PDB: 1J53 and 1J54); Ecoli RNT,
exoribonuclease T of Escherichia coli, P30014. (C) Hsap AKA, A-kinase
anchoring protein 18 gamma of Homo sapiens, AAF28106; Athal CPD1,
putative CPD1 of Arabidopsis thaliana, CAA16750; Athal CPD2, putative
CPD2 of Arabidopsis thaliana, CAA16751; yeast YG59, hypothetical
26.7 kDa protein of yeast, P53314; Ecoli LIGT, 2'-5' RNA ligase of
Escherichia coli, P37025; ns2, non-structural protein (ORF2-encoded) of
the coronaviruses HCoV-OC43 (AAA74377), BCoV-Quebec (P18517), and
MHV-A59 (P19738); EToV pp1a, C-terminal fragment of EToV pp1a, S11237;
HRoV VP3, VP3 of human rotavirus, BAA84964; ARoV VP3, VP3 of avian
rotavirus PO-13, BAA24128. (D) Ecoli o177, putative polyprotein of
Escherichia coli, AAC74129; Hsap Y1268a, KIAA1268 protein of Homo
sapiens, BAA86582; Hsap H2A1.1, histone macroH2A1.1 of Homo sapiens,
AAC33434; yeast YMX7, hypothetical 32.1 kDa protein of yeast, Q04299;
yeast YBN2, hypothetical 19.9 kDa protein of yeast, P38218. (E) Yeast
YBR1, putative ribosomal RNA methyltransferase (rRNA
(uridine-2'-O-)-methyltransferase) of yeast, P38238; yeast SPB1,
putative rRNA methyltransferase SPB1 of yeast, P25582; yeast YGN6,
putative ribosomal RNA methyltransferase YGL136c (rRNA
(uridine-2'-O-)-methyltransferase) of yeast, P53123; Ecoli FTSJ, cell
division protein of Escherichia coli, NP_417646.

Figure 5. Nidoviruses encode homologs of cellular enzymes involved in
RNA processing. (A) The cellular pathways for processing of pre-U16
snoRNA and pre-tRNA splicing are summarized, with relevant enzymatic
activities indicated. For details, see the text. Homologs of the
highlighted enzymes have been identified in nidoviruses (see also
Figure 1 and the text). (B) Table summarizing the conservation of
homologs of the cellular enzymes presumably involved in RNA processing
in SARS-CoV and different nidovirus groups.
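
The marking scheme described in the legend to Figure 4 (an asterisk for
positions occupied by identical residues, a colon for positions whose
residues all fall within one of the five similarity groups) can be
illustrated with a minimal Python sketch. The function names, the
handling of gapped columns, and the toy sequences are assumptions made
here for illustration; the published alignments were prepared with
GeneDoc, and this is not a reimplementation of that program.

```python
# Similarity groups as quoted in the Figure 4 legend.
SIMILARITY_GROUPS = [set("DENQ"), set("ST"), set("KR"), set("FWY"), set("ILMV")]


def column_marker(column):
    """Return '*' for an identical column, ':' for a similar one, ' ' otherwise."""
    residues = set(column)
    if "-" in residues:
        # Assumption: columns containing a gap are never marked.
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return ":"
    return " "


def marker_row(aligned_sequences):
    """Build the inter-segment marker row for equal-length aligned sequences."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_marker(col) for col in zip(*aligned_sequences))


if __name__ == "__main__":
    # Made-up toy alignment, not taken from Figure 4.
    toy = ["DKSLF", "DRTIW", "DKSMY"]
    for seq in toy:
        print(seq)
    print(marker_row(toy))
```

Because a column is marked only when every residue in it belongs to the
same group, adding a single dissimilar sequence to the comparison
removes the marker, which matches the "in all proteins under
comparison" wording of the legend.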

The nidovirus homologs of the five RNA processing enzymes discussed
above may interfere with these or similar cellular RNA processing
pathways to reprogram the cell for the benefit of virus reproduction.
It seems even more conceivable that they, alone or in concert with
other enzymes like the RdRp or helicase, are involved directly in viral
RNA synthesis, particularly in transcription, which, in an apparent
parallel with snoRNA-driven processes, is guided by conserved
oligonucleotide base-pairing interactions (Figure 3(C)). The viral
enzymes, like their cellular counterparts, might be part of separate
pathways or, alternatively, cooperate in a single pathway in which the
XendoU, ExoN and 2'-O-MT homologs provide RNA specificity, and the CPD
and ADRP homologs modulate the pace through processing of compound(s)
containing 2'-phosphate groups. In this respect, we note that both the
XendoU/ExoN/2'-O-MT and CPD/ADRP cellular pathways start with an
endoribonuclease-mediated cleavage to produce molecule(s) with
2'-3'-cyclic phosphate termini (Figure 5), indicating the structural
basis for possible cooperation of the coronavirus homologs of these
enzymes in a single pathway. The expected functional hierarchy of the
five putative nidovirus enzymes (Figure 5(A)) is supported by their
corresponding evolutionary conservation, with the XendoU homolog being
absolutely conserved and the CPD homolog being least conserved among
nidoviruses (Figure 5(B)).



Concluding Remarks 

The availability and comparative analysis of the SARS-CoV genome and
proteome set the stage for the extensive biological characterization of
this emerging pathogen and the development of anti-SARS-CoV strategies.
Our conclusion that SARS-CoV is distantly related to group 2
coronaviruses (Figure 2) implies that viruses from this group,
especially the extensively studied mouse hepatitis virus and its
derivatives lacking non-essential CPD-like and HE genes, may be the
best available models for both in vitro and in vivo studies, in
particular where the synthesis of viral macromolecules and the
structure and function of the replication complex are involved. A
detailed comparative characterization of the BCoV/HCoV-OC43 pair may
provide invaluable insights into the processes of adaptation of a
non-human coronavirus to a human host, which should be highly relevant
to understanding the emergence of SARS-CoV. The SARS-CoV genome
(Figure 1) lacks genes that are common in group 2 viruses, like PL1pro
and CPD-like and HE genes, but encodes a number of unique protein
sequences, underlining the capacity of coronaviruses for large-scale
evolutionary change. The comparative studies presented here have
tentatively identified both known and novel viral enzymes (Figures 1
and 5), most of which may be involved in RNA processing and have
homologs of which the tertiary structure has been solved (Figure 1).
Intriguing parallels have been drawn between these putative viral
enzymes and characterized, but distant, cellular homologs that will
guide the functional dissection of the replicases of SARS-CoV and
related viruses and may put the mechanism of coronavirus RNA synthesis
in a completely new perspective. The newly described putative enzymes
of SARS-CoV double the list of potential targets for the design of
antiviral strategies aimed at controlling this emerging virus
infection.



Materials and Methods 

Analysis of intracellular SARS-CoV RNA 

Vero cells were infected with SARS-CoV (Frankfurt 1 or HKU-39849) at an
MOI of 0.01 or were mock infected. At the onset of cytopathogenic
effect (approximately 40 hours post infection), intracellular RNA was
isolated by cell lysis for ten minutes at room temperature with 5%
(w/v) lithium dodecyl sulfate in LET buffer (10 mM Tris-HCl (pH 7.4),
100 mM LiCl, 1 mM EDTA), containing 20 µg/ml of proteinase K. After
shearing of the cellular DNA using a syringe, lysates were incubated at
42 °C for 15 minutes, extracted with phenol (pH 4.0) and chloroform,
and RNA was ethanol-precipitated. The RNAs were separated in denaturing
1% (w/v) agarose gels containing 2.2 M formaldehyde and Mops buffer
(10 mM Mops (sodium salt) (pH 7), 5 mM sodium acetate, 1 mM EDTA).
Dried gels were used for direct hybridization with 32P-labeled
oligonucleotides SARSV001 (5'-CGAGGTTGGTTGGCTTTTCCTG-3') and SARSV002
(5'-CACATGGGGATAGCACTAC-3'), which are complementary to sequences in
the SARS-CoV leader sequence and the genomic 3' end, respectively.
After hybridization, gels were analyzed using a Personal FX Molecular
Imager and Quantity One software (both from Bio-Rad).
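
Since the two oligonucleotides are described as complementary to viral
sequences, the sequence each probe is expected to anneal to is its
reverse complement. The following minimal Python sketch derives those
target sequences and locates them in a sequence given as a string. The
function names are hypothetical, the sequences are handled in the DNA
alphabet (T rather than U), and no claim is made here about where the
sites lie in the SARS-CoV genome.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Probe sequences as given in the text, written 5' to 3'.
PROBES = {
    "SARSV001": "CGAGGTTGGTTGGCTTTTCCTG",
    "SARSV002": "CACATGGGGATAGCACTAC",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_site(genome, probe):
    """Return the 0-based start of the probe's target in `genome`, or -1."""
    return genome.upper().find(reverse_complement(probe))


if __name__ == "__main__":
    for name, probe in PROBES.items():
        print(name, "anneals to", reverse_complement(probe))
```

With a genome sequence loaded as a plain string (for example from the
NC_004718 record), `find_probe_site` would report where each probe is
expected to hybridize, provided the record is in the same sense as the
RNA being probed.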

Methods for bioinformatics 

Genpeptides, Conserved domain (CD) and protein family (Pfam) databases
were used in this study. Amino acid sequence alignments were generated
using ClustalX 1.81 and Dialign2 programs assisted by Blosum
position-specific matrices, and were processed for presentation using
GeneDoc. Multiple sequence alignments were converted into hidden Markov
model (HMM) profiles using HMMER 2.01 software. Sequence databases were
searched in default mode, unless stated otherwise, using the HMMER 2.01
package and a family of Blast programs.

The expectation values of similarity (E) of 0.05 or lower for Blast
searches and 0.1 or lower for HMMER-mediated searches were considered
to be statistically significant. Database searches with nidovirus
proteins (Tables 1 and 2) and their alignments were conducted in an
iterative mode until no new homologs were identified. Also, sequences
that fell short of the significance threshold during the last iteration
were used to initiate reciprocal searches that might have resulted in
new significant matches. This approach worked for all protein families
described here, except for the identification of the relationship
between the nidovirus ExoN family and the cellular DEDD superfamily,
which is known to be extremely diverse. In this latter case, using the
MAST program, we found a strong match (p = 3 e…) between the most
conserved motif III of a DEDD protein and a conserved block of the ExoN
family that facilitated the identification of the two other motifs in
the nidovirus proteins, which have a non-typical intermotif spacing
partially occupied by Zn-finger(s) (see the text and Figure 4).
Furthermore, we observed an approximately 30-fold selective increase of
the global similarity between the ExoN family and DEDD proteins after
the coronavirus sequences were modified artificially by removing
putative Zn-fingers that are not present in the DEDD proteins. In the
HMMER-mediated searches of >10^… sequences using this
Zn-finger-deficient ExoN family as a query, numerous DEDD proteins were
retrieved immediately after the nidovirus proteins, starting with
E = 0.81. The relatively poor statistics of these hits were due to the
failure by HMMER to align all three motifs.
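
The iterative search procedure described above (search, add every hit
at or below the E-value threshold, and repeat until no new homologs
appear) can be outlined with a short Python sketch. This is a schematic
illustration only: `search` stands for any hypothetical wrapper around
a Blast or HMMER run, and the sequence names and E-values in the toy
data are invented. Only the two thresholds are taken from the text.

```python
BLAST_E_MAX = 0.05   # significance threshold quoted for Blast searches
HMMER_E_MAX = 0.1    # significance threshold quoted for HMMER searches


def iterative_search(seeds, search, e_max):
    """Repeat `search` until no new significant hits are found.

    `search` is a hypothetical callable that takes a frozenset of sequence
    IDs and returns {hit_id: e_value}. Returns (family, sub_threshold),
    where sub_threshold holds last-iteration hits above e_max, i.e. the
    candidates for reciprocal searches.
    """
    family = set(seeds)
    while True:
        hits = search(frozenset(family))
        new = {sid for sid, e in hits.items() if e <= e_max} - family
        if not new:
            weak = {sid for sid, e in hits.items() if e > e_max} - family
            return family, weak
        family |= new


# Invented toy data standing in for real database searches.
TOY_HITS = {
    "seed": {"homolog_1": 1e-30},
    "homolog_1": {"homolog_2": 1e-12, "weak_hit": 0.5},
    "homolog_2": {},
}


def toy_search(queries):
    """Return the best (lowest) E-value seen for each hit of any query."""
    best = {}
    for query in queries:
        for sid, e_value in TOY_HITS.get(query, {}).items():
            best[sid] = min(e_value, best.get(sid, float("inf")))
    return best


if __name__ == "__main__":
    print(iterative_search({"seed"}, toy_search, HMMER_E_MAX))
```

Returning the sub-threshold hits separately mirrors the second step in
the text, where such sequences seed reciprocal searches rather than
being added to the family directly.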

Cluster phylogenetic trees were reconstructed using the
neighbour-joining algorithm described by Saitou & Nei with the Kimura
correction, and were evaluated with 1000 bootstrap trials, as
implemented in the ClustalX 1.81 program. Parsimonious trees were
generated using exhaustive search and evaluated with bootstrap
branch-and-bound search using a UNIX version of the PAUP* 4.0.0d55
program that is included in the GCG-Wisconsin Package programs. The
resulting trees were visualized using the TreeView program.
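
To make the distance step behind the neighbour-joining trees concrete,
the sketch below computes the observed fraction of differing residues
(p) between two aligned protein sequences, applies a correction for
multiple substitutions, and resamples alignment columns for one
bootstrap trial. It assumes that the "Kimura correction" refers to
Kimura's empirical approximation d = -ln(1 - p - p^2/5); the exact
behaviour of ClustalX 1.81 (for example at very high divergence) is not
reproduced, and all function names are hypothetical.

```python
import math
import random

BOOTSTRAP_TRIALS = 1000  # number of bootstrap trials quoted in the text


def p_distance(seq_a, seq_b):
    """Fraction of differing residues, ignoring positions with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_distance(p):
    """Assumed correction: d = -ln(1 - p - p**2 / 5)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(inner)


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement (one bootstrap trial)."""
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in aligned]


if __name__ == "__main__":
    toy = ["MKVLDE", "MRVIDE", "MKALNQ"]  # made-up sequences
    rng = random.Random(0)
    replicate = bootstrap_alignment(toy, rng)
    p = p_distance(toy[0], toy[1])
    print(p, kimura_distance(p), replicate)
```

The corrected distance always exceeds the observed one for p > 0, which
is the point of the correction: it compensates for substitutions hidden
by repeated change at the same site.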
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